Autonomous animation in embodied agents

ABSTRACT

Embodiments described herein relate to the autonomous animation of Gestures by the automatic application of animations to Input Text—or the automatic application of animation Mark-up wherein the Mark-up triggers nonverbal communication expressions or Gestures. In order for an Embodied Agent's movements to come across as natural and human-like as possible, a Text-To-Gesture Algorithm (TTG Algorithm) analyses the Input Text of a Communicative Utterance before it is uttered by an Embodied Agent, and marks it up with appropriate and meaningful Gestures given the meaning, context, and emotional content of the Input Text and the gesturing style or personality of the Embodied Agent.

RELATED APPLICATIONS

This application claims the benefit of PCT Application No. PCT/IB2021/060793 (filed on Nov. 22, 2021) and New Zealand Application No. 770193 (filed on Nov. 20, 2020). The entirety of each of the foregoing applications is incorporated by reference herein.

TECHNICAL FIELD

Embodiments of the invention relate to autonomous animation of Embodied Agents, such as virtual characters, digital entities, and/or robots. More particularly but not exclusively, embodiments of the invention relate to the automatic and real-time analysis of conversational content to dynamically animate Embodied Agents.

BACKGROUND

Behaviour Mark-up Language, or BML, is an XML-based description language for controlling verbal and nonverbal behaviour for “Embodied Conversational Agents”. Rule-based gesture generators, such as BEAT (SIGGRAPH '01), apply rules to generate gestures paired with features of text, such as key words. This results in repetitive and robotic gesturing, which is difficult to customize on a granular level. Large databases of rules and gestures are required. Speech-driven gesture generators use neural networks to generate automatic movements from learnt gesture and speech combinations. However, these generators often work in a black-box manner, assume a general relationship between input speech and output motion, and have been of limited success.

U.S. Pat. No. 9,205,557B2 discloses a method for generating contextual behaviours of a mobile robot. A module for automatically inserting command tags in front of key words is provided. U.S. Pat. No. 9,721,373B2 discloses programs for creating a set of behaviours for lip sync movements and nonverbal communication, which may include analysing a character's speaking behaviour with acoustic, syntactic, semantic, pragmatic, and rhetorical analyses of the utterance.

Efficient, automatic, on-the-fly augmentation and/or modification of communicative utterances by embodied, autonomous agents remains an unsolved problem. Further, animating Embodied Agents in a manner that is realistic, non-repetitive, and readily customizable remains an unsolved problem.

OBJECT OF INVENTION

It is an object of the invention to improve autonomous animation in embodied agents, or to at least provide the public or industry with a useful choice.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows a TTG Algorithm according to one embodiment.

FIG. 2 shows an Emphasis Detection algorithm according to one embodiment.

FIG. 3 shows an example of suitable rule weightings for Emphasis Detection.

FIG. 4 shows an example of the scoring process for Emphasis Detection.

FIG. 5 shows an Embodied Agent in a variety of different Poses.

FIG. 6 shows blending between arm Variation Poses.

FIG. 7 shows a first example of blending between hand Variation Poses.

FIG. 8 shows a second example of blending between hand Variation Poses.

FIG. 9 shows an affective response system.

FIG. 10 shows an example JSON implementation of a Mapping Matrix.

FIG. 11 shows a Mapping Matrix.

FIG. 12 shows an example JSON implementation of a Mapping Matrix.

FIG. 13 shows an example of a Detection System.

FIG. 14 shows an example of a Dictionary.

FIG. 15 shows an affective response system with multiple Mapping Systems.

DISCLOSURE OF INVENTION

Embodied Agents, such as virtual characters, digital entities, and/or robots, may interact with a user by uttering speech from textual input in real-time. An Embodied Agent may be a digital avatar, cartoon character, anthropomorphic avatar, etc., or may be a physical avatar, e.g. a physical robot. A physical robot may include various mechanical units for different parts, e.g., a face part, a body part, etc., enabling the physical avatar to make various facial motions and/or body motions.

An Embodied Agent may have a face comprising at least one of eyes, nose, and mouth, and may be animated to present various facial motions. The avatar may also have one or more body parts, including at least one of a head, shoulders, hands, arms, legs, feet, etc., and may be animated to present various body motions.

Text-to-speech (TTS) and lip animations synchronized to the speech enable such Embodied Agents to resemble human-like speech. Nonverbal communication, such as facial expressions and hand gestures, assists with human communication and brings realism to the animation of Embodied Agents.

Embodiments described herein relate to the autonomous animation of Gestures by the automatic application of animations to Input Text—or the automatic application of animation Mark-up wherein the Mark-up triggers nonverbal communication expressions or Gestures.

Text-to-Gesture

In order for an Embodied Agent's movements to come across as natural and human-like as possible, a Text-To-Gesture Algorithm (TTG Algorithm) analyses the Input Text of a Communicative Utterance before it is uttered by an Embodied Agent, and marks it up with appropriate and meaningful Gestures given the meaning, context, and emotional content of the Input Text and the gesturing style or personality of the Embodied Agent.

For example, the Input Text: “Would you like to talk about our technology, or our business model?” may be processed by the TTG Algorithm to output → “#SlightlyHappy Would you #Shrug like to #Smile talk about our #BeatBothArmsLeft technology, or our #BeatBothArmsRight business #PalmsSpread model?”

The TTG Algorithm uses Natural Language Processing (NLP) to get the best possible understanding of the context, meaning, and communicative intention from the about-to-be-uttered text, in order to generate the most nuanced and natural gestures for it. The TTG Algorithm is modular and extensible, so that new and more sophisticated analysis can be added, and existing analysis can be modified or removed easily.

Method

FIG. 1 shows a TTG Algorithm according to one embodiment.

Parsing

At step 102, Input Text 6 is received by a Parser which returns a Parse Tree for each clause of the Input Text 19. Each clause is a tree, and each node in the tree is a token, roughly equivalent to a word, which also contains information about the token such as its lemma, part of speech tag, the dependency relationship with its parent node, whether it is a strong keyword, whether it is part of a list of noun phrases, etc. In one embodiment, dependency parsing outputs a dependency tree, which provides relationships between tokens. Any suitable dependency parsing method or system may be used.
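By way of illustration, the parsing step could be implemented with an off-the-shelf dependency parser. The following is a minimal sketch using spaCy (one suitable tool, also mentioned later in this disclosure); the model name and the particular token fields extracted are illustrative assumptions, not part of the claimed method.

    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English model (assumed available)

    def parse_clause(text):
        """Return token records roughly matching the Parse Tree description:
        lemma, part-of-speech tag, and the dependency link to the parent node."""
        doc = nlp(text)
        return [
            {
                "text": tok.text,
                "lemma": tok.lemma_,
                "pos": tok.pos_,          # coarse part-of-speech tag
                "dep": tok.dep_,          # dependency relation to the parent
                "parent": tok.head.text,  # parent node in the dependency tree
            }
            for tok in doc
        ]

    print(parse_clause("Would you like to talk about our technology?"))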

Clause Analyser

At step 104, a clause analyser attaches further information about the Input Text 19 to the Parse Tree 8. The clause analyser derives information about the clause and tokens, to provide as input to the Mark-up Generator, which generates Mark-up based on clause analysis information.

Clauses are analysed for semantic and syntactic patterns, and keywords, emotions, and dialogue acts are identified. In one embodiment, the clause analyser receives a dependency tree and, using the dependency information, identifies beats, negations, and enumeration behaviours in the clause. Clause Analysis also attaches sentiment information to the dependency tree.

Sentiment

Any suitable machine learning or rule-based method may be used to classify the sentiment of the clause. Clauses may be classified based on valence (positive-neutral-negative), arousal (low-neutral-high), and fine-grained emotional content (for example: joy, sadness, anger, surprise, fear, disgust).

In one embodiment, a text sentiment analysis function is constructed using a support vector machine (SVM). Any suitable method of text sentiment analysis may be used. The SVM may be trained using conversational content from a specific domain. For general purpose conversation, the SVM may be trained using a broad range of domains and styles, lengths of utterance, and other parameters. Any other suitable classifier may be used, including, but not limited to, a neural network, a decision tree, a regression-based classifier, or a Bayesian classifier. A deep neural network may be suitable for classifying fine-grained emotional content.
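A minimal sketch of such an SVM-based sentiment function follows, using scikit-learn. The training clauses, labels, and feature extraction are toy assumptions; as noted above, any suitable classifier and training corpus may be used.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy training set of (clause, valence label) pairs; a real system would
    # train on conversational content from the relevant domain(s).
    clauses = ["I love this", "This is terrible", "It is a table", "What a joy"]
    labels = ["positive", "negative", "neutral", "positive"]

    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(clauses, labels)

    print(model.predict(["I am very excited"]))  # e.g. ['positive']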

Word Sentiment may identify sentiment at the word level and identify words as positive or negative. In one embodiment, a negative/positive word dictionary is used. The valence of individual words in a clause may be recorded. For example, in a clause with an overall positive valence, the clause analyser may identify non-negated words with a positive valence, and non-negated words with a negative valence.

In one embodiment, sentiment-based animations are applied to sentences based on the sentiment score. Any suitable model for sentiment analysis may be used and appropriately trained to determine a sentiment score.

Negation Scope Detection

Tokens (words) that are negated can be determined based on dependency links (e.g., descendants of a negation are considered to be negated by the negation). The dependency tree structure may determine the scope of any negation words (i.e., which words can be considered negated). In particular, any word that is a descendant, a sibling, or a nibling (a child of a sibling) of a negation falls within the scope of the negation.
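The following sketch applies the stated scope rule (descendants, siblings, and niblings of a negation) over a spaCy dependency parse; the "neg" dependency label is an assumption carried over from spaCy's English models.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def negation_scope(text):
        doc = nlp(text)
        negated = set()
        for tok in doc:
            if tok.dep_ == "neg":  # e.g. "not", "n't"
                # Descendants of the negation itself.
                negated.update(t for t in tok.subtree if t is not tok)
                # Siblings (other children of the negation's head) and
                # their descendants, i.e. the niblings.
                for sib in tok.head.children:
                    if sib is not tok:
                        negated.update(sib.subtree)
        return [t.text for t in sorted(negated, key=lambda t: t.i)]

    print(negation_scope("I do not like green avocados"))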

Enumeration

Noun chunks and phrasal verbs may be used to determine groups of words. A list of noun chunks (noun phrases) may be provided.

Phrasal verbs may be detected. In one embodiment, phrasal verbs may be detected by an algorithm comprising the steps of: 1. finding verbs; 2. searching backwards for adverbs; 3. searching forwards for adverbs, prepositions, and noun phrases.
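As an illustration of those three steps, the sketch below scans a flat list of (word, part-of-speech) pairs; the Universal POS tag names and the simplified noun-phrase pattern (optional determiner plus noun) are assumptions. The example input matches the phrasal-verb example given under Pattern Analysis below.

    def find_phrasal_verbs(tagged):
        """tagged: list of (word, pos) tuples for one clause."""
        spans = []
        for i, (word, pos) in enumerate(tagged):
            if pos != "VERB":
                continue
            # Step 2: search backwards for adverbs.
            start = i
            while start > 0 and tagged[start - 1][1] == "ADV":
                start -= 1
            # Step 3: search forwards for adverbs, then a preposition and
            # a (simplified) noun phrase.
            end = i + 1
            while end < len(tagged) and tagged[end][1] == "ADV":
                end += 1
            if end < len(tagged) and tagged[end][1] == "ADP":
                end += 1
                if end < len(tagged) and tagged[end][1] == "DET":
                    end += 1
                if end < len(tagged) and tagged[end][1] == "NOUN":
                    end += 1
                    spans.append([w for w, _ in tagged[start:end]])
        return spans

    example = [("very", "ADV"), ("quickly", "ADV"), ("running", "VERB"),
               ("away", "ADV"), ("from", "ADP"), ("the", "DET"), ("wolf", "NOUN")]
    print(find_phrasal_verbs(example))  # one span covering the whole phrase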

Information about groups of words may be used to drive animation. For example, in “Would you like a green avocado, or a brown avocado?” an embodied agent could point to the left over “green avocado” and to the right over “brown avocado”, rather than treating them as individual words.

Beats may be repeated within a group. For example: “I am going on holiday tomorrow” might trigger a circle on ‘going’ and a chop on ‘tomorrow’, but “I can see a big yellow fluffy giraffe” might trigger repeated chops on ‘big’, ‘fluffy’, and ‘giraffe’.

Dialogue Act Classification

Dialogue act classification may classify dialogue acts such as listing options, asking a question, explaining, offering alternatives, describing, asserting, retracting, offering an opinion, apologizing, greeting, changing the subject, predicting, instructing, insulting, or teasing. In other words, dialogue act classification classifies what a Communicative Utterance is trying to achieve.

Dialogue act classification may be carried out using any suitable classification method, including, but not limited to, rule-based methods and machine learning based methods. In one embodiment, a deep learning classifier is trained on a broad range of dialogue acts.

For questions, the grammatical mood of the Communicative Utterance may be determined (questions tend to be in the interrogative mood), or the utterance may be checked against a dictionary of ‘question’ phrases, such as those beginning with who, what, when, where, how, do, or does. The dialogue act classifier may also receive as input whether there is a question mark at the end of the clause. The dialogue act classifier may subdivide this dialogue act into different kinds of questions, such as asking the user about themselves or for their opinion, asking for clarification, asking to repeat, and rhetorical questions. Advising and instructing are often in the imperative mood, or preceded by “you should” or “you could”.

For offering alternatives or contrasting ideas, it might be two clauses separated by a conjunction such as ‘or’ or ‘but’, or two noun phrases or phrasal verbs separated by a conjunction. For example, “We could organise a party for him, or we could wait and see if he organises one himself”. For listing several options or items, find a series of noun phrases or phrasal verbs separated by commas or conjunctions. For example, “Are you going on holiday or travelling for work?”; “You will need a pair of 3 mm needles, 100 g of 4ply yarn in the colourway of your choice, and a cable needle.”

In another example, if the text is “there are many banks in New Zealand: ASB, ANZ, BNZ and Westpac.”, the intention may be classified as “enumeration”. Hypotheticals, conditionals, or counterfactuals may be indicated with the phrases “what if”, “only if”, “if . . . then . . . ”, and so on.

In one embodiment, dialogue act classification can be combined with sentiment analysis to add further nuances to nonverbal communication.

Tone Classification

The tone of the content being delivered may be classified and used to modulate behavioural performance. Examples of dimensions of tone to classify may include serious vs. humorous, deferent vs. assertive, aloof vs. compassionate, casual vs. formal, or matter-of-fact vs. enthusiastic. In other words, tone classification classifies the manner of a Communicative Utterance and may modulate gestural and emotional performance accordingly while the utterance is delivered.

Tone classification may be carried out using any suitable classification, including, but not limited to, rule-based methods and machine-learning based methods. In one embodiment, different dimensions of tone may be classified via different machine learning classifiers. In another, a deep learning classifier may classify across a wide range of tonal dimensions.

Pattern Analysis

A pattern analyser finds noun phrases and series of noun phrases, phrasal verbs and series of phrasal verbs. The pattern analyser may identify transitive verbs by checking for a preposition and then a noun phrase following the rest of the verb phrase. For example, “very quickly running away from the wolf” is analysed as a phrasal verb, because the part of speech tags are, respectively, “ADVERB, ADVERB, VERB, ADVERB, PREPOSITION, DETERMINER, NOUN” (and ‘DETERMINER, NOUN’ is a noun phrase).

The pattern analyser may determine the mood, tense, verb form, adjectival form (e.g. superlative, comparative), person, number, and other morphological features.

Such information may be used to influence animation—for example, by increasing the size of gestures on superlative and comparative tokens.

The “person” of a clause may influence animation by animating actions such that they are directed to the appropriate “person”. For example, a clause in the first person may generate more speaker-directed actions, a clause in the second person may generate more listener-directed actions, and a clause in the third person may generate undirected actions.

Tense of a clause may influence gestures, for example, by animating clauses in the past tense with more “rigid” animations, and the future tense with “looser” animations, representing hypotheticals.

Location Analysis

Dictionaries of positional and directional phrases may be provided: one for each of high, low, narrow (or centre), and wide. These can be exact string matches or pattern matches; for example, “under $NOUNPHRASE” would match to “he was under the sea”, “it was under a big round table” and “she was there under some kind of pretence” but not “they were under 18”.

Context

Contextual information from previous clauses and even previous conversation turns (both the Embodied Agent 12's and the user's) may be used to provide broader context for the particular clause being analysed. For example, if the Embodied Agent 12 is asking the user to repeat themselves, the Embodied Agent 12 may perform the utterance slightly differently the second time around: with more emphasis on the key points or with less hesitancy. If a proper noun or other term has already been introduced into the conversational context, it may be less likely to be a keyword in subsequent mentions. If the current utterance is changing the topic of conversation, there may be more (or larger or more forceful) gestures to indicate emphasis on the new topic.

Mark-up Generator

At step 108, a Mark-up Generator uses the information in the Analysed Tree to generate Mark-up for various kinds of gestures. The Analysed Tree may comprise a Parse Tree annotated with information from Clause Analysis. Each of these Mark-up generators may add candidate Mark-ups before or after the whole clause, or before or after any individual word. Many gestures ‘hit on’ (or ‘are triggered on’) a word, by which we mean that the stroke point of the gesture (extreme point) occurs at the same time as the stressed syllable of that word. This means that the gesture may start before the word, in order to give it time to reach its stroke point at the moment of the stressed syllable.

Gestures include facial expressions, head and neck gestures, arm and hand gestures, and full body movement. All gestures are made up of a pose and an action, where the pose is the starting point of the gesture and the action is the motion applied from that starting pose. For each action, the starting pose may be defined explicitly or it may be whatever the current pose is, for example the end pose of the previous gesture.

Dialogue Act Specific Gestures

Examples of dialogue act specific gestures which may be applied include: Questions trigger shrugs and palms-up outward arcs, triggering on the main verb or keyword of the clause. Negations trigger head shakes and arms crossing over or wrists flicking dismissively. Offering alternatives maps to one hand out to one side and then the other to the other side, indicating weighing scales. Listing nouns or verbs as, for example, three options, maps to both arms gesturing with a chop to one side, then both in the middle, then both to the other side (or similar gestures that follow a path, such as pointing at a low level, then a bit higher, then a bit higher still). Any more than four items in a list instead maps to counting off on the fingers.

Symbolic Gestures

Symbolic gestures are those that carry a specific meaning. The meaning might be emblematic (the gesture stands in for a word or phrase), for example a wave communicating a greeting; iconic (the gesture literally represents the meaning of the word or phrase), for example tracing a square shape for the word “box”; or metaphoric (the gesture represents the meaning of the word or phrase, but not literally), for example tracing a square shape for the word “confined”. These are triggered from a dictionary lookup for each gesture, containing emblematic, iconic and metaphoric trigger phrases in one dictionary. The phrases in the dictionary can optionally indicate which word in the phrase the gesture should hit on. By default, it will hit on the first word in the phrase. These phrases can also be matched to patterns rather than exact string matches, for example “I am Sam” matches the pattern “I am $PROPERNOUN”, but “I am hungry” does not. This kind of gesture should be applied sparingly, otherwise it can look like the Embodied Agent is acting out the utterance, which can come across as comical or patronising. The rate of symbolic gestures is defined in the personality/style configuration. In one embodiment, symbolic gestures match against a universal dictionary for each gesture.
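A minimal sketch of such a dictionary lookup follows. The gesture tags, trigger phrases, and hit-word indices are invented for illustration; real dictionaries would be authored per gesture, and pattern entries (such as “I am $PROPERNOUN”) would be handled by the token-matching scheme described later.

    SYMBOLIC_GESTURES = {
        "#Wave":        {"phrases": ["hello", "goodbye"], "hit_index": 0},
        "#TraceSquare": {"phrases": ["box", "confined"],  "hit_index": 0},
    }

    def mark_symbolic_gestures(words):
        """Return (word index, gesture tag) pairs: the word each gesture hits on."""
        marks = []
        for tag, entry in SYMBOLIC_GESTURES.items():
            for phrase in entry["phrases"]:
                phrase_words = phrase.split()
                for i in range(len(words) - len(phrase_words) + 1):
                    window = [w.lower() for w in words[i:i + len(phrase_words)]]
                    if window == phrase_words:
                        # By default the gesture hits on the first word of the
                        # phrase; hit_index lets the dictionary override that.
                        marks.append((i + entry["hit_index"], tag))
        return marks

    print(mark_symbolic_gestures("hello there , I feel confined".split()))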

Beats

Beat gestures emphasise words non-meaningfully (e.g. not in a symbolic way or connected to any specific dialogue act). Beats are triggered on words in the clause as picked out by the Emphasis Detection algorithm, at a rate defined in Configuration Settings. The Action is chosen based on the personality and gesturing style as defined in the config. The kinds of Actions include chops (up, down, diagonal), circles, and arcing actions, all of which can be applied on a range of base arm and hand Poses to produce a wide variety of gestures: from a rigid pontificating gesture to a fluid open arcing gesture.

Thus Beats are applied to keywords as specified in the analysed tree, of types defined in global Configuration Settings. Each beat gesture consists of a pose and an action, and each pose consists of arm, wrist, and hand elements.

Embodiment Gestures

Embodiment Gestures are gestures that people do in virtue of being embodied. For example, people take a deep breath or sigh before starting a long description or explanation. In Embodied Agents, deep breaths may be triggered before long sentences. Another example is shifting weight from one foot to the other, which occurs when people get tired. In Embodied Agents, this may be triggered between (some) clauses and in other gaps. Pausing and looking up to one side to think or remember something may be triggered stochastically between clauses and before long or very rare words, or proper nouns the first time they are used, as if trying to think of the word or name. Sometimes these are accompanied by a furrowed brow, or a filled pause or hesitation marker such as ‘um’. People do a wide array of grooming gestures, such as straightening their clothes, scratching their noses, or tucking their hair behind their ears, which are triggered in gaps with no other gestures, at a rate that is specified by the personality of the individual.

Turn-Taking Gestures

When people pause their speech but don't intend to cede the conversational floor, they tend to look away and sometimes do floor-retaining gestures (such as holding up a hand or a finger), or fill the pause with an ‘um’ or ‘ah’. Turn-taking behaviour may be triggered at some clause boundaries and before long or rare words, including proper nouns the first time they are mentioned. When people have finished speaking, to cede the floor, they (for example) make direct eye contact and smile expectantly, sometimes also doing a ‘your turn’ type gesture (for example, one or both hands with palms up indicating towards the conversational partner). Such gestures are triggered at the end of their entire utterance (which may be one or several clauses). When a conversational partner (user) attempts to interrupt the character, they might do a floor-retaining gesture to indicate they're not giving up the floor, or they might look a bit surprised and stop talking and gesturing, ceding the floor to the user (how likely they are to do this may be configurable based on personality and role). When a user is speaking, backchannel gestures are triggered in the form of nods and smiles, frowns, ‘hmm’s and ‘uh huh’s, based on rapid sentiment analysis of interim STT results.

Poses

A pose is the starting point for a gesture: where the body moves to before it starts the gesture. For example, poses may include body, head, arm, wrist and finger elements. Each of these may have a base pose and some controlled random variation added on. Each element is chosen from a set of base poses that are compatible with the chosen action (as the action is the main part of the gesture, it is chosen first). From these compatible poses, the pose is chosen stochastically at frequencies defined by the personality, style and role config. Controlled random variation is obtained by blending in a small amount of a “variation pose”. These variation poses are chosen using information from the location analyser, as well as the sentiment scores, and, if not determined by those, are chosen at random. The amount of the variation pose that is blended is chosen from a range specified either by the location analyser, sentiment modulation, or the default range (which is likely to contain smaller values, since it is just for adding variety, not visibly pulling the pose in a specific direction).

Voice Modulation

Tags may be inserted to modulate the voice in order to align it better with the gestures chosen, the result being a coherent overall performance of the utterance. For example, the speed, pitch, and volume of the voice on individual words may be modified to emphasise those words. Such features may be modulated for an entire clause to change the emotional tone. For example, increasing speed and volume while decreasing pitch sounds more angry, and decreasing all three sounds more sad, etc.

Mark-Up Solver

The Mark-up Solver takes the Parse Tree which has been annotated with all the candidate Mark-ups as decided by the respective Mark-up generators, and outputs the original text with appropriate Mark-ups added to produce a coherent performance of that utterance, to be sent to be processed into speech and animation.

For example, some gestures can be performed together (like one head gesture and one body gesture), while others cannot. Some gestures only make sense to perform in conjunction with a series of other gestures (for example, if the utterance was “on one hand, A, but on the other hand, B” it makes the most sense to do both sides of the gestures for weighing up two options, rather than doing one side and not the other). The Mark-up Solver resolves these conflicts but retains connected gestures, to build a coherent gestural performance of the utterance.

In one embodiment, for words that have at least one Mark-up tag, the Mark-up Solver picks at most one body gesture and one head gesture for each word. This may be implemented using a priority-based approach (see the sketch after the list below). Where there are multiple candidate gestures for a given word, gestures may be chosen in a predefined order of priority. In one embodiment, the following order of priority is used:

-   Replacing existing manual tags
-   Client-override tags
-   Symbolics, but not too many
-   Dialogue acts
-   Enumerating
-   Beats
-   Turn-taking
-   Embodiment
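The sketch below illustrates the per-word conflict resolution under this priority order: for each word, at most one body gesture and one head gesture survive. The category names and candidate records are assumptions for illustration.

    PRIORITY = ["manual", "client_override", "symbolic", "dialogue_act",
                "enumeration", "beat", "turn_taking", "embodiment"]

    def solve_markup(candidates):
        """candidates: {word_index: [(category, channel, tag), ...]},
        where channel is 'body' or 'head'."""
        chosen = {}
        for idx, cands in candidates.items():
            picks = {}
            for category in PRIORITY:  # highest priority first
                for cat, channel, tag in cands:
                    if cat == category and channel not in picks:
                        picks[channel] = tag  # keep one tag per channel
            chosen[idx] = picks
        return chosen

    example = {3: [("beat", "body", "#ChopDown"),
                   ("symbolic", "body", "#Shrug"),
                   ("dialogue_act", "head", "#HeadShake")]}
    print(solve_markup(example))  # {3: {'body': '#Shrug', 'head': '#HeadShake'}}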

In another embodiment, the whole clause or even the whole paragraph is taken into account, to ensure that the gestures taken as a whole form a coherent performance. This would ensure that a series of gestures taken together formed a sequence in a sensible or natural pattern. For example, a wide arcing gesture followed by one or more small chop beats is a common sequence, but a chop then an arc then another chop is less natural, and a series of gestures that zigzag in space (wide, narrow, wide, narrow) tends to look unnatural unless they are zigzagging for a communicative (symbolic) reason. It would also ensure that longer or more significant gestures were given enough time to play out, but shorter gestures could be triggered in faster succession.

Pose and Action Scheme

A Pose and Action scheme independently applies Pose and Action to Input Text. Independent Poses and Actions may be applied to beat gestures, or any other suitable type of Gesture.

Pose

A Pose is the dimensions and/or position of a Gesture, such as the position of limbs. For example, the Pose of the arms of an Embodied Agent (e.g. arm positions) may be wide/narrow, or high/low:

-   Wide/Medium/Narrow
-   High/Medium/Low

FIG. 5 shows an Embodied Agent 12 in a variety of different Poses while the Embodied Agent 12 speaks a Communicative Utterance. The Input Text and Mark-up of the Communicative Utterance is as follows:

-   [middle_pose][strong_beats] Please place your [low_beats] ticket [low_pose] under the [medium_beats] scanner.

The example shows how Poses and Actions may be applied at different parts of Input Text. Once a pose is defined, all subsequent actions start from the defined pose.

FIG. 5A shows the Embodied Agent 12 in a wide arm/medium arm height Pose. FIG. 5B and FIG. 5C show the Embodied Agent 12 in a low arm height Pose.

Poses may be associated with a pose speed (how quickly a certain pose is reached from a neutral pose or previous pose). Poses may be associated with property tags, e.g.:

-   String name
-   Left/Right or both (referring to whether the pose is a handed one)
-   Dimension tags. For example, Arm poses may be associated with a width tag (e.g. whether it is a narrow, medium, or wide width pose) and/or a height tag (whether it is a high, medium, or low height arm pose).

In one embodiment, the Embodied Agent 12 is returned to a “neutral” pose after each action. In another embodiment, the end pose of a certain action may become the new start pose of a new Action.

Actions

Actions refer to the movement trajectories of various feature points in a face part or a body part. Actions may be based on any suitable 3D reconstruction techniques. For example, an Action indicating a body motion may be reconstructed by a set of predetermined feature points in the body part.

Actions may be configured with suitable parameters, including, but notlimited to:

-   Type
-   Intensity
-   Frequency
-   Speed

One property of each action is which poses it can be applied on top of (not all combinations work, for example if the pose is already wide and the gesture is opening the arms out wide).

Word-Token-Matching Regular Expression

A regular expression is a sequence of characters that specifies a search pattern. These patterns can be used by text-searching algorithms to find instances of text that match the pattern. Modern regular expressions used in computing are called ‘regex’ and typically include (but are not limited to) these operators:

-   Normal text characters and numbers: a-z, A-Z, 0-9, CJK characters, spaces, etc. E.g. a search pattern “a” will match the text “cat” at the second position. A search pattern “cat” will match the text “concatenate” at position 4.
-   ‘.’: A dot is a wildcard. It will match any character. E.g. a search pattern “c.t” will match “cat”, “cot”, and “cut”.
-   ‘*’: An asterisk will match zero-or-more of the preceding character. E.g. a search pattern “cut*” will match zero or more of the ‘t’ character: “cube”, “cute”, “cutting”.
-   ‘+’: A plus sign will match one-or-more of the preceding character.
-   ‘( )’: Parentheses define scope and precedence of operators.

In one embodiment, a method of text-matching operates on clause tokens instead of individual characters.

A “token” normally corresponds to an individual word, with some exceptions: “don't” resolves to two tokens representing “do” and “n't”. Grammatical particles, such as a comma “,”, have dedicated tokens. These tokens encapsulate linguistic features of the text they represent as attributes, including (but not limited to):

-   Part-of-speech: Noun, Verb, Adjective, Punctuation, etc. These can be specified as a standard shorthand: “adjective” is “ADJ”, “proper noun” is “PROPN”, etc.
-   Detailed part-of-speech: comparative adverb, determiner, proper singular noun, etc.
-   Lemma: The base form of the word. E.g. the lemma of “looking” is “look”. The lemma of “is” is “be”.
-   Stem: The word stem (not currently used in any form; could be used in the future). E.g. the stem of “fishing”, “fished”, and “fisher” is “fish”. The stem of “argue”, “argued”, “argues”, and “arguing” is “argu”.
-   Dependency: The syntactic dependency, or the relationship of a token to its parent token (tokens exist within a tree structure and each token may have a parent or children).

Ordinary text can be converted into tokens using any suitable tool, such as SpaCy.

This token-based text matching can be used by specifying an attribute to match with, as in the following examples (see the sketch after this list):

-   “$lemma:look over there” will match “They looked over there”, “They are looking over there”, and “They will look over there”.
-   “I am $pos:PROPN” will match proper nouns, e.g. the character introducing themselves: “I am Sam”, “I am Rachel”, etc.
-   “was $pos:ADV+excited”: the ‘+’ symbol will match one-or-more of the preceding operator (adverb). E.g. “I was really excited”, “I was really very excited”—“really” and “very” are both adverbs in this sentence.
-   The asterisk operator can be used similarly to match zero-or-more: “was $pos:ADV* excited” will additionally match “I was excited”.
-   “a. or.”: the ‘.’ symbol here will match any token, unlike in normal regular expressions where it would match a single letter/numeral. “a. or.” could be useful for detecting when alternatives are being presented.
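As a sketch of how such patterns might be evaluated, the function below implements only the “$lemma:” and “$pos:” operators from the examples above over a spaCy parse; the ‘+’, ‘*’, and ‘.’ operators would be handled analogously.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def token_matches(pattern, text):
        """True if the token pattern matches a contiguous span of the text."""
        pat = []
        for item in pattern.split():
            if item.startswith("$lemma:"):
                pat.append(("lemma", item[len("$lemma:"):].lower()))
            elif item.startswith("$pos:"):
                pat.append(("pos", item[len("$pos:"):]))
            else:
                pat.append(("text", item.lower()))
        doc = nlp(text)
        for start in range(len(doc) - len(pat) + 1):
            span = doc[start:start + len(pat)]
            if all(
                {"lemma": tok.lemma_.lower(), "pos": tok.pos_,
                 "text": tok.text.lower()}[kind] == value
                for (kind, value), tok in zip(pat, span)
            ):
                return True
        return False

    print(token_matches("$lemma:look over there", "They looked over there"))  # True
    print(token_matches("I am $pos:PROPN", "I am Sam"))                        # True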

Dictionary files storing lists of these search patterns may be stored. If some text matches one of the search patterns, a relevant action or emotion may be registered to be performed when that text is spoken.

Configurability

Gestures, Poses and Actions may be configurable. In one embodiment, possible configurations of Gestures, Poses and Actions are defined in Gesture Configuration Settings. For example, a Gesture Configuration File such as a JSON file may define all Gestures, Poses and Actions, along with the available parameters of those Gestures, Poses and Actions. Examples of configurable parameters include:

-   pose intensity (what is the weighting on a particular pose)
-   gesture intensity (how pronounced or accentuated is the gesture)
-   gesture frequency (what is the probability of the gesture being used)

In one embodiment, Gesture configurations are defined in Gesture Configuration Settings. Gesture Configuration Settings may determine available Gestures and ranges of motions for each type of gesture. Gestures may be “complete” gestures, meaning they include both a complete action and pose, as opposed to being split by pose and action.

For each Gesture, the Configuration Setting may include a range of movements and configurable parameters for that Gesture. For example, acceptable values for the speed of an Action may be restricted between a “speed min” and a “speed max” value. A gesture speed value may be randomly generated between speed min and speed max, and provided as input to a speed tag, e.g. “[speed,0.98]”.

Gesture frequency defines the probability of a gesture being randomly selected. Each gesture, or category of gestures, may be associated with a frequency. For example, various beat gestures may have the following frequencies: “chop”: 0.4, “circle”: 0.1, “small arc”: 0.5, “wide arc”: 0. When a word has been identified as one that needs a gesture, an appropriate gesture may be selected based on the frequency rates.
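A minimal sketch of frequency-based selection, using the example beat-gesture rates above (the exact sampling scheme is an assumption):

    import random

    BEAT_FREQUENCIES = {"chop": 0.4, "circle": 0.1, "small arc": 0.5, "wide arc": 0.0}

    def pick_beat_gesture(frequencies=BEAT_FREQUENCIES):
        """Sample a gesture name in proportion to its configured frequency."""
        names = list(frequencies)
        weights = list(frequencies.values())
        return random.choices(names, weights=weights, k=1)[0]

    print(pick_beat_gesture())  # e.g. 'small arc'; 'wide arc' is never chosen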

BEAT Action Configuration Settings, for example for a movement of an arc with palms down, may define a set of available arm poses, wrist poses and hand poses (as some actions are not compatible with some poses). The Configuration Setting also defines amplitude ranges for four preset beat “strengths”, i.e. extra strong, strong, medium, or low. The Emphasis Detection algorithm described herein determines the “strength” of a beat for each word (if any), and the exact strength is randomly chosen within the given range. At runtime, when generating a beat gesture, a random selection may be made from each of the available arm, wrist and hand poses. BEAT pose Configuration Settings may be defined for wrist poses, including variation poses for wrist poses, such as for palms up, palms down, and palms centre.

Personality Configuration—Global Configuration Setting

In one embodiment, Embodied Agents are endowed with different personalities using one or more Global Configuration Settings. Global variables may be set which affect the expression of all Gestures. Global Configuration Settings define the tendency and usage of Gestures within possible ranges. An Embodied Agent's personality may be configured using Global Configuration Settings.

In one embodiment, a global Configuration Setting JSON encapsulates all levers a character author might want to tweak to create a gesturing style, such as Gesture speed, Gesture height and width (average), types of beat action, hand poses, wrist orientation, excitability, hesitancy, and any other suitable parameters.

In a further embodiment, the parameters in the global Configuration Setting may be modulated.

In one embodiment, the global Configuration Setting defines the following global parameters:

Speed

The global Configuration Setting may define parameters that determine the speed of Actions. For example, the global Configuration Setting may determine a minimum speed and a maximum speed for Actions. In one embodiment, different speed parameters may be set for different types of Gestures. For example, symbolic Gestures and beat Gestures may be configured with different speed parameters.

Symbolic Gesture speed defines how fast the Embodied Agent moves into Symbolic Gestures. A minimum speed and a maximum speed for moving into Symbolic Gestures may be defined for the Embodied Agent.

Beat Gesture speed defines how fast the Embodied Agent moves into Beat Gestures. A minimum speed and a maximum speed for moving into Beat Gestures may be defined for the Embodied Agent.

Gesture Type

Rates of different types of beat gestures may be defined. For example:

    "beat_types": {
      "values": [
        {
          "name": "arc_palm_down",
          "rate": 0.2
        },
        ...

Gesture Frequency

The global Configuration Setting may define the frequency of certain types of Gestures by an Embodied Agent. For example, a maximum number of Symbolic Gestures per sentence may be defined, ensuring that the Embodied Agent does not display too many symbolic gestures.

The global Configuration Setting may independently set the rate of strong gestures, medium gestures, and low gestures (which may be used to create variety in Beat Gestures). A weight of ‘strong’, ‘medium’ or ‘low’ is placed on each emphasised word. Global configuration values rate_strong, rate_medium, and rate_low define how often gestures of different sizes are used for a personality. The sum of these three values is the overall gesture rate. The global Configuration Setting sets how many strong, medium, and low beats an Embodied Agent utters in a sentence.

An “emphasis” parameter changes the speed of speech based on the emphasis strength. A negative value will slow down speech. E.g.:

    "emphasis": {
      "tag": "[[ speed EMPHASIS]]",
      "strong": -0.25,
      "medium": -0.2,
      "low": -0.15
    },

A “head” configuration adds high-level (#) Mark-up tags on emphasised words based on the strength of emphasis and the sentiment of the sentence. These high-level tags are defined in a high-level configuration file.

Sentiment threshold variables may define the range of neutral sentiment. Sentiment analysis may return a value between −1.0 (full negative) and +1.0 (full positive). Within a type of gesture, the global Configuration Setting may set the frequency of certain subtypes of gestures (e.g. circling actions, chopping actions, etc.), or even the frequency of individual gestures.

Pose Configuration/Gesture Dimensions

The global Configuration Setting may determine the tendencies of gesture dimensions for an Embodied Agent. For example, for Beat Gestures, the global Configuration Setting may define the frequency of different poses, e.g. arm positions. In one embodiment, the global Configuration Setting defines what percentage of an Embodied Agent's arm positions are in a low, medium or high arm height/position, and independently defines what percentage of an Embodied Agent's arm positions are in a low, medium or high width from one another. There may be independent configurations for:

-   arm_positions: the rates of different arm heights and widths for beat gestures: height (low, mid, high), width (narrow, middle, wide, extra-wide)
-   hand_positions: the rates of different hand positions/shapes used for beat gestures
-   hand_orientation: the Embodied Agent's tendency to gesture with palms up, centre, or down

Handedness and Symmetry

Embodied Agents may be configured to have a “handedness”, by defining the frequency and/or strength of gestures on one hand to be greater than that on the other, in the Configuration Setting.

The rate of each hand for single-handed symbolic gestures may be defined, e.g.:

    "handedness": {
      "values": [
        {
          "name": "left",
          "rate": 0.5
        },
        ...

The rate of non-symbolic (beat) gesturing with hands together, vs. one or the other hand, may be defined, e.g.:

    "symmetry": {
      "values": [
        {
          "name": "together",
          "rate": 0.4
        },
        ...

Emotion

An emotion parameter may define how much the animation of an Embodied Agent is affected by emotion. An emotional_threshold parameter defines how easily emotion affects an Embodied Agent, by defining how high a sentiment score must be before the size of gesturing is increased. A pose_speed_multiplier parameter multiplies the pose speed when the emotional threshold is exceeded. An action_speed_multiplier multiplies the action speed when the emotional threshold is exceeded. In other embodiments, pose and action speed may be modified additively rather than multiplicatively.

A rate_multiplier may define how much the Embodied Agent's frequency of gestures increases in response to emotion.

A size_level_offset may increase the size of gestures by a number of levels in response to emotion.

A height_offset may define an increase in the height of gestures, and a hands spread offset may define an increase in the width of gestures.

Gesture Intervals

A gesture_interval variable may define a minimum and maximum number of words between gestures.

A first_gesture_offset variable may predefine the minimum number of words before the first gesture of a sentence. This ensures that the first gesture does not start to play before the Embodied Agent is speaking; that is, that the gesture offset is smaller than the total time the Embodied Agent has been speaking.

A hesitancy variable may inject hesitancy markers, or filler words (such as “ums” and “ahs”).

The global Configuration Setting may define parameters determining how affected Embodied Agents are by various inputs.

For example, emotional modulation may be achieved by setting a variable which determines how affected an Embodied Agent is by the sentiment of a sentence.

However, sentence sentiment is only one example of input which may affect the behaviour of the embodied agent. Other aspects may include audio input (e.g. from the agent's virtual environment or from a user via a microphone), visual input (e.g. from the agent's virtual environment or from a user via a camera), input from a user interface, or any other suitable input.

The parameters within the global Configuration Setting may be associated with multipliers, which are set using modulatory rules. For example, action speed multipliers may be set to modulate the speed of gesturing, and rate multipliers may modulate the frequency of gestures. A size level offset may increase the amplitude of gestures (resulting in gestures getting “bigger” or “smaller”).

Randomization

By defining ranges of gesture parameters, and frequencies of gestures, the global Configuration Setting parameters affect the degree of variation and randomization of autonomous animation.

Modulation

At step 106, Modulation may include:

-   swapping out animation files (so that one individual uses, e.g., “wave01” and another uses “wave02” in the same place in speech);
-   using different gestures (so one individual uses “chop” and another “circle” for emphasis);
-   increasing or decreasing the speed or amplitude of gestures;
-   modifying the rate of gesturing (how many gestures Embodied Agents carry out).
-   Modulation may modify the overall rates of gesturing, and/or rates of certain types of gesturing. Rates of gesturing can be set in Configuration Settings, and determine how many gestures (of various kinds) are applied to sentences.

A modulation Module may modify and/or be modified by clause analysis and/or Mark-up generation.

Demographic Modulation creates differences in the gesturing style of Embodied Agents across factors like age, gender, race, and culture. For example, Embodied Agents portraying younger characters may be more expressive and less dominant than older characters. Some gestures are meaningful only within a specific culture or may have quite different meanings in different cultures (even when they speak the same language).

Personality Modulation may modulate gestures to align with personality traits such as extroversion, introversion, confidence, friendliness, and openness. These are defined in a config and map onto more fine-grained behavioural traits (e.g. high energy). The fine-grained traits map onto low-level differences in gesture mark-ups (e.g. more frequent, bigger, faster gestures). These differences are implemented by using different average values for gesturing rate, amplitude, and speed respectively. Further examples of personality modulation include: higher rates of embodiment gestures for nervous or less confident personalities (these are inserted between clauses with some probability—change the probability to change how many they do on average); a wider variety of gestures for more expressive personalities (set the rates of each gesture to be lower but greater than zero for many gestures, vs. higher rates for a smaller number of different gestures); a higher prevalence of palms-up, open hand, more fluid/smoother arcing gestures for friendlier and more open personalities; and a higher prevalence of rigid pontificating gestures for more authoritative personalities (set a higher rate for, e.g., gestures in which the palms are up).

Style Modulation may apply idiosyncratic gesturing styles to Embodied Agents. Style Modulation may be more fine-grained than personality modulation, and define low-level gesture characteristics, such as whether an Embodied Agent tends to gesture with a relaxed palm-up hand pose or a stiff fingers-spread palm-down hand pose (or many other options), whether they tend to use chop actions, circling actions, fluid arcing actions, etc., and whether they tend to use their left or right hand, or tend to gesture symmetrically. All of these can be defined broadly by their personality, but they can be tweaked to give the individual character a unique style. These are all defined in high-level/personality Configuration Settings, in which the rate of left/right/both hands can be set, and the rate of chop gestures and circling gestures, etc.

Role Modulation enables a single Embodied Agent to display different gesturing behaviour depending on the role they are in at the time, even for the same utterance. For example, if a person is presenting an idea at a conference talk, they will likely use different gestures to when they are engaging in a casual conversation, even if in both cases they're saying the same words. Other roles may include explaining or outlining some facts, guiding or advising, tutoring or teaching. The particular role that the character is playing interacts with their personality and idiosyncratic style to form the resulting overall gesturing style.

Sentiment Modulation refers to using the results of sentiment analysis to trigger specific gestures, and also to modulate potentially any or all other gestures. The specific gestures might be smiles and eyebrow raises, thumbs up or clapping for pleased or happy emotions, especially for expressing pleasant surprise, or frowns and clenched fists for expressing anger or frustration. The arousal expressed in the clause also modulates the gestures that are chosen. For example, high arousal (such as clauses expressing excitement or frustration) will mean that the poses (the starting points of the gestures) become wider and higher, fingers become more spread, gestures become more frequent, and actions become bigger and faster.

This is achieved in two ways: first, by adding offset values to the frequency of gestures and the amplitude and speed of each gesture. The offset is positive for high arousal, and negative for low arousal, and is scaled so that the higher the arousal, the higher the offset, and vice versa.

Second, for the arm and hand poses, a variation pose is blended in. For the arms, the variation pose is the widest and highest pose (for high arousal), which is blended with the base pose to a small-medium degree to ‘pull’ the base pose for each gesture wider and higher. For the hands, the variation pose is the fingers at maximal spread blended to a small-medium degree, which pulls the fingers slightly more spread in whichever base pose they are in. These offsets and degrees of variation poses are configurable as part of the modulation of personality and gesturing style. For example, one character may be more expressive than another, so highly emotional content will have a larger impact on their gesturing behaviour.
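The sketch below combines the two mechanisms: an arousal-scaled offset on gesture rate, amplitude, and speed, plus a small-to-medium blend weight for the wide/high variation pose. The scale factors and blend range are illustrative assumptions; in practice they would come from the personality and style configuration.

    import random

    def modulate_gesture(base, arousal, expressiveness=1.0):
        """base: dict with 'rate', 'amplitude', 'speed'; arousal in [-1, 1]."""
        offset = arousal * expressiveness  # positive for high arousal, negative for low
        modulated = {
            "rate": base["rate"] + 0.2 * offset,
            "amplitude": base["amplitude"] + 0.3 * offset,
            "speed": base["speed"] + 0.25 * offset,
        }
        # Blend weight for the widest/highest variation pose: a small-medium
        # degree, applied only when arousal is high.
        modulated["variation_blend"] = random.uniform(0.1, 0.3) * max(arousal, 0.0)
        return modulated

    print(modulate_gesture({"rate": 0.5, "amplitude": 1.0, "speed": 1.0}, arousal=0.8))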

Sentence-level emotion configuration takes the overall sentiment of a sentence and applies the relevant change in emotion. Each emotion (such as anger, concern, disgust, fear) may be connected to a dictionary (defining words triggering the emotion). For each emotion, low, mid and high values of the emotion may be defined, each having an intensity and a duration. The intensity of the detected emotion may be determined by sentiment analysis. A duration may define how long the emotion lasts. An intensity multiplier defines the extent to which a base emotion is negated.

The Agent may be simulated using a neurobehavioural model (a biologically modelled “brain” or nervous system), comprising a plurality of modules having coupled computational and graphical elements. Each module represents a biological process and includes a computational element relating to and simulating the biological process and a graphical element visualizing the biological process. Thus, the Agent may be “self-animated” to perform certain behaviour without external control and thus exhibit naturally occurring automatic behaviour such as breathing, blinking, looking around, yawning, and moving its lips. Biologically based autonomous animation may be achieved by modelling multiple aspects of the nervous system, including, but not limited to, the sensory and motor systems, reflexes, perception, emotion and modulatory systems, attention, learning and memory, rewards, decision making, and goals. The use of a neurobehavioural model to animate a virtual object or digital entity is further disclosed in: Sagar, M., Seymour, M. & Henderson, A. (2016) Creating connection with autonomous facial animation. Communications of the ACM, 59(12), 82-91, and WO2015016723A1, also assigned to the assignee of the present invention and incorporated by reference herein.

The Autonomous Animation System may give and receive signals to and from the neurobehavioural model. Sending signals allows the sentiment and content of the Embodied Agent's utterances to affect their internal emotional state, which in turn may affect their underlying emotional or idle animations. Receiving signals allows external factors to affect their gestures, such as the character's perception of the user's emotional state or identification of objects in the field of view, allowing them to be more responsive to the user and the situation. Another example is detecting whether the user is paying attention and, if not, introducing some speech disfluency: for example, stopping and restarting clauses.

Variation Poses

Instead of adding random variation to each particular joint (which may result in unnatural poses), a Variation Pose system enables blending between two or more coherent Input Poses to create a new, blended pose. Input Poses may be deliberately authored by an animator to blend in a coherent manner.

FIG. 6 shows blending between arm Variation Poses. FIG. 6A shows an Input Pose of a wide stance, and FIG. 6B shows a Variation Pose configured to blend with the pose of FIG. 6A. FIG. 6C shows a Blended Pose which is an intermediate pose between FIG. 6A and FIG. 6B.

FIG. 7 shows a first example of blending between hand Variation Poses. FIG. 7A shows an Input Pose of an outstretched hand, and FIG. 7B shows a Variation Pose, of a folded hand, configured to blend with the pose of FIG. 7A. FIG. 7C shows a Blended Pose which is an intermediate pose between FIG. 7A and FIG. 7B.

FIG. 8 shows a second example of blending between hand Variation Poses. FIG. 8A shows an Input Pose of a hand with curled fingers, and FIG. 8B shows a Variation Pose configured to blend with the pose of FIG. 8A. FIG. 8C shows a Blended Pose which is an intermediate pose between FIG. 8A and FIG. 8B.

In one embodiment, the TTG System generates a Blended Pose using the following steps:

-   Select or receive an Input Pose. In one embodiment, the Input Pose is a “base pose”, which means it is the default pose in which a body part of the Embodied Agent is configured.
-   Select or receive a corresponding Variation Pose, configured to blend with the Input Pose.
-   Blend between each Input Pose and one or more Variation Poses to generate a Blended Pose.

In one embodiment, an Input Pose and the Variation Pose are each selected with an intensity, and blended together (e.g. 0.8 of Pose 1 is blended with 0.9 of Pose 2).

In another embodiment, two or more Variation Poses configured to blend with one another are selected, and blending weights between each of the poses are randomly generated, specifying the degree to which the Variation Poses are blended (e.g. 0.2 of Pose 1 is blended with 0.4 of Pose 2 and 0.4 of Pose 3).
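A simplified sketch of such blending is shown below, with poses reduced to per-joint angles; real Input and Variation Poses would be full skeletal poses authored to blend coherently, and the weighting scheme here is an assumption.

    import random

    def blend_poses(poses, weights=None):
        """Blend compatible poses by a normalised weighted average of joint values."""
        if weights is None:
            weights = [random.random() for _ in poses]  # random blending weights
        total = sum(weights)
        weights = [w / total for w in weights]
        return {joint: sum(w * p[joint] for w, p in zip(weights, poses))
                for joint in poses[0]}

    input_pose = {"shoulder": 10.0, "elbow": 45.0}   # e.g. a base pose
    variation  = {"shoulder": 30.0, "elbow": 20.0}   # authored to blend with it
    print(blend_poses([input_pose, variation], weights=[0.8, 0.9]))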

Pose selections may be restricted to be compatible with the action that is about to come. A set of compatible poses may be predefined for each action, from which one is chosen.

Autonomously Emotive Speech

In one embodiment, Embodied Agents are autonomous dynamic systems, with self-driven behaviour, which can also be controlled (in a weighted or controllable fashion) externally by the TTG System as described herein, allowing a blend of autonomy (wherein Embodied Agent gestures are driven by their internal emotional states) and directability (wherein Embodied Agent gestures are driven by text as per the TTG System). “Bottom up” autonomous behaviour may be facilitated by a programming environment such as that described in the patent U.S. Ser. No. 10/181,213B2 titled “System for Neurobehavioural Animation”. A plurality of Modules are arranged in a required structure, and each Module has at least one Variable and is associated with at least one Connector. The Connectors link Variables between Modules across the structure, and the Modules together provide a neurobehavioral model. Variables and/or Modules may represent neurotransmitters/neuromodulators such as dopamine or oxytocin, which may be used to affect the operation of the structure.

The neurobehavioural model may include an emotional system as described in the patent application PCT/IB2020/056280, ARCHITECTURE, SYSTEM, AND METHOD FOR SIMULATING DYNAMICS BETWEEN EMOTIONAL STATES OR BEHAVIOR FOR A MAMMAL MODEL AND ARTIFICIAL NERVOUS SYSTEM, incorporated by reference herein.

For each word carrying emotional content, the TTG System may output both a possible gesture plus one or more emotional impulses. Each emotional impulse perturbs the state of the internal emotional system. The internal emotional system is a dynamical system in flux, with emotions competing against each other and sustaining and decaying, providing a history of emotional states.

Thus, the internal emotional reaction of the Embodied Agent depends on the content and order or sequence of the words.

In one embodiment, the TTG System may process each word sequentially and output one or more emotional impulses as soon as the word is processed. In another embodiment, the TTG System may process an entire clause, sentence, and/or paragraph, and output emotional impulses according to any suitable rules or analysis of the sentence.

Thus, Autonomously Emotive Speech drives the emotional system in a layerable, blendable way with history—the content of Input Text (e.g. key words or sentiments) affects the internal state of the Embodied Agent so that emotions linger and blend appropriately.

In one embodiment, words may be decomposed into two or more underlying emotions. For example, the word “marvellous” can be construed as both “surprising” and “happy”, and “horrified” can be decomposed into “fear”+“disgust”. In one embodiment, two or more “emotion dictionaries” each contain lists of words representing elements of a particular emotion. Words or tokens are matched against the emotion dictionaries to determine which component emotions apply to the words or tokens.

In one embodiment, each word matched in an emotion dictionary may also be paired with a dictionary match variable representing the degree to which the word is relevant to the emotional dictionary. For example, a “fear” dictionary may contain words with corresponding dictionary match variables as follows: horrifying 0.9, disaster 0.92, scary 0.8, uncomfortable 0.6. Both the matched emotions as well as the dictionary match variables may be returned and provided as input to the emotion system. This provides a way of responding to complex, compound emotions in a compositional, blendable and transitional way.
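The sketch below illustrates dictionary-based decomposition. The word lists and match values are illustrative (the “fear” entries echo the example above, and the other dictionaries are invented); real dictionaries would be much larger.

    EMOTION_DICTIONARIES = {
        "fear":     {"horrifying": 0.9, "disaster": 0.92, "scary": 0.8,
                     "uncomfortable": 0.6, "horrified": 0.85},
        "disgust":  {"horrified": 0.7},
        "surprise": {"marvellous": 0.6},
        "happy":    {"marvellous": 0.8},
    }

    def component_emotions(word):
        """Return {emotion: dictionary match variable} for every dictionary
        containing the word; compound emotions fall out compositionally."""
        word = word.lower()
        return {emotion: entries[word]
                for emotion, entries in EMOTION_DICTIONARIES.items()
                if word in entries}

    print(component_emotions("marvellous"))  # {'surprise': 0.6, 'happy': 0.8}
    print(component_emotions("horrified"))   # {'fear': 0.85, 'disgust': 0.7}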

Emphasis Detection

An Emphasis Detection algorithm determines the importance of words in a Communicative Utterance, enabling an Embodied Agent to emphasise the most important words with gestures. An Emphasis Detection Algorithm may identify key words according to certain criteria. In one embodiment, the Emphasis Detection Algorithm identifies which words in each clause will be given a strong, medium, low, or no emphasis.

FIG. 2 shows an Emphasis Detection algorithm according to one embodiment. At step 202, an Input Text is received. At step 204, each Emphasis Detection rule is applied to each "token" or word w in the Input Text. Calculation of the word score may include the application of several rules. At step 206, for each Emphasis Detection rule, a rule score is calculated for the relevant token or word. Emphasis Detection rules may be weighted such that some rules have greater influence on the word score than others. At step 208, an overall Emphasis Score for the token or word is calculated. At step 210, the Emphasis Scores are returned. The Emphasis Scores for the words are then used to apply Gestures.
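
A minimal Python sketch of this scoring loop follows; the rule functions and weights are placeholders for whatever rules an embodiment defines (compare the weightings of FIG. 3), not values from the disclosure:

    # Illustrative Emphasis Detection rules; real embodiments may define many more.
    def sentiment_rule(word):       # rule score from a sentiment dictionary
        return {"excited": 0.7}.get(word.lower(), 0.0)

    def capitalisation_rule(word):  # fully capitalised words score 1.0
        return 1.0 if word.isupper() and len(word) > 1 else 0.0

    RULES = {"sentiment": sentiment_rule, "capitalisation": capitalisation_rule}
    RULE_WEIGHTS = {"sentiment": 0.8, "capitalisation": 0.3}  # per-rule weights

    def emphasis_scores(tokens):
        scores = {}
        for w in tokens:                                                   # step 204
            rule_scores = {name: rule(w) for name, rule in RULES.items()}  # step 206
            scores[w] = sum(RULE_WEIGHTS[name] * s                         # step 208
                            for name, s in rule_scores.items())
        return scores                                                      # step 210

    print(emphasis_scores("I am VERY excited".split()))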

In one embodiment, the Emphasis Detection algorithm looks up the rareness of each word. A look-up table of words and associated "frequencies" (of use of that word in a particular language or context) may be used to return word rareness for each word.

Words with relatively higher Emphasis Scores may trigger a "beat": a type of gesture which does not carry any speech content, but conveys non-narrative content and aligns with the rhythm of speech. The Emphasis Detection recognises the parameters within which a keyword has been defined to activate rules.

A "weight" or intensity may range between 0 and 1. Weights are specified for each rule. Weights may be applied in two ways: a "weight" per rule and a "weight" per word.

The weight of a rule remains constant; e.g. the sentiment rule is always weighted at a value of 0.8. Meanwhile, a keyword is weighted depending on its stated value within the corresponding dictionary; e.g. "I am very excited" ("excited" being listed as 0.7 in the sentiment dictionary).

Multiple keywords may be identified in a given sentence and emphasized with beat gestures accordingly. In one embodiment, the Emphasis Detection algorithm identifies keywords in a given clause and assigns all words high, medium, low or no emphasis based on the weighted keyword identification algorithm. Scores are calculated for all words in a sentence, then sorted in descending order. The top 10% are defined as strong beats, the following 10% as medium beats, and the next 10% as low beats. Any suitable thresholds may be provided to categorize beats as strong, medium and/or low.
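
The percentile binning might be sketched in Python as follows; the 10% thresholds mirror the example above and are configurable:

    def bin_beats(scores: dict[str, float]) -> dict[str, str]:
        """Assign strong/medium/low/no beat labels by Emphasis Score percentile."""
        ranked = sorted(scores, key=scores.get, reverse=True)
        n = len(ranked)
        labels = {}
        for i, word in enumerate(ranked):
            if i < 0.1 * n:
                labels[word] = "strong"
            elif i < 0.2 * n:
                labels[word] = "medium"
            elif i < 0.3 * n:
                labels[word] = "low"
            else:
                labels[word] = "none"
        return labels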

Beat Gestures may be applied to the stressed syllable such that the stroke of the beat is in sync with the stressed syllable in a word.

Rules may be combined in any suitable manner, including summing or finding the MAX. One example of suitable rule weightings is shown in FIG. 3. FIG. 4 shows an example of the application of rules to the input text "John loves snorkelling in Greece".

Emphasis Detection Fine-Tuning

The weights for the Emphasis Detection rules may be fine-tuned on human-annotated data using, for example, a greedy algorithm or a deep learning model. A collection of sentences (preferably over 1500) covering various semantic domains is selected as a training dataset. Human annotators manually select the keywords (emphasis words) for each sentence. In total, 3540 sentences are used as the training dataset. In one embodiment, a plurality of annotators are used, and the conformity of their annotation decisions may be measured. In one experiment, the applicants found that two human annotators agreed on 71.44% of emphasized words. The annotations from all annotators may be used at the same time to avoid overfitting to a single annotator's annotations.

In one embodiment, the weights are fine-tuned using a greedy algorithm. The greedy algorithm tweaks the weights to obtain maximum accuracy on the training data. All weights are initialised randomly. At each iteration, all weights are fixed except one, chosen at random, which is tuned by searching within [0, 1] at 0.01 precision to maximize accuracy on the training data. The algorithm terminates after 10,000 iterations.
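
A sketch of such a greedy coordinate search is given below, assuming an accuracy(weights) function (not shown) that scores a candidate weight vector against the human annotations:

    import random

    def greedy_tune(num_weights, accuracy, iterations=10_000):
        """Randomly initialise, then repeatedly re-tune one weight at a time."""
        weights = [random.random() for _ in range(num_weights)]
        for _ in range(iterations):
            i = random.randrange(num_weights)            # all weights fixed except one
            candidates = [k * 0.01 for k in range(101)]  # search [0, 1] at 0.01 steps
            weights[i] = max(candidates,
                             key=lambda v: accuracy(weights[:i] + [v] + weights[i + 1:]))
        return weights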

In another embodiment, a deep neural network is used to train the weights. A 1-layer fully connected feedforward network without bias or activation, implemented in Keras, is used to find the weights.
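
One way such a network might look in Keras is sketched below; the number of rules, the loss, and the optimiser are illustrative assumptions, not specified in the disclosure:

    import tensorflow as tf

    num_rules = 7  # one input feature per Emphasis Detection rule (illustrative)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_rules,)),
        tf.keras.layers.Dense(1, use_bias=False, activation=None),  # no bias, no activation
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
    # After fitting on (rule scores, human emphasis labels), the learned kernel
    # can be read back as the rule weights: model.layers[0].get_weights()[0]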

Advantages

The TTG System creates impressions of different personalities by varying the gesturing style of an Embodied Agent. The TTG System is highly configurable. A person with an understanding of personality and body language, for example a film director, can use this system to create different realistic behaviours in Embodied Agents. The person can choose the set of gestures used, for example palm up vs palm down. They can also adjust the speed, rate, size and location of the gesturing. They can specify how emotionally expressive the agent is by configuring how the gestures are affected by the sentiment of the sentence. All of the above aspects influence the perceived personality of the agent.

An Action and Pose scheme is used to generate a large variety of gestures efficiently, in a manner requiring less computational storage space. The Action and Pose scheme also saves animator time, as a large set of animations may be generated automatically without requiring all variations to be manually crafted by an animator.

The system identifies the gesture types most commonly used in dialogues, including:

-   Symbolic gestures (iconic, metaphoric, emblematic): identified based on string-matching and dictionaries. E.g., tracing a square for the word "square"; using an upward gesture for "higher".
-   Dialogue Act gestures: identified by rules based on linguistics. E.g., a small shrug and open-palm arc outward for a question; a head shake and a dismissive flick of the wrist for negation; pointing left and then right on "this or that" in "you can have this or that".
-   Emphasizing gestures: identified using keyword detection. E.g., applying a beat gesture to "really" in "this is really bad".
-   Embodiment gestures: e.g., looking up and to one side, furrowing the brow, and then looking back as if retrieving the term "constructivist epistemology"; shifting weight from one foot to the other between clauses.
-   Turn-taking gestures: e.g., looking away between clauses when not finished (retaining the conversational floor), and looking directly at the user and smiling when finished (ceding the conversational floor).

The TTG System results in more human-like autonomous animation because the TTG System derives linguistic information from Input Text which helps inform animation. The TTG System detects negations based on the relationships between words in the dependency tree that represents the sentence. The TTG System detects enumerating behaviours by finding noun phrases, verb phrases, and other patterns in the parts of speech of words.
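
As an illustration of dependency-tree-based negation detection, the following sketch uses spaCy; the disclosure does not name a particular NLP library, so this is one possible realisation:

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

    def negated_heads(text: str) -> list[str]:
        """Return the words governing a negation in the dependency tree."""
        doc = nlp(text)
        return [tok.head.text for tok in doc if tok.dep_ == "neg"]

    print(negated_heads("I do not like waiting"))  # ['like']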

Variation Poses introduce natural looking randomness to the gestures.

Affective Response Modulation in Embodied Agents

Inventions disclosed herein provide a Mapping Matrix for multidimensional affective and/or emotional mapping operations, adapting the reaction to affective stimuli (which may be external or internal) to portray desired personality traits and/or temperament, or to fit specific use cases. Multidimensional affective and/or emotional mapping operations may thus serve to modulate or provide additional inputs to be weighed in driving a behavioural simulation of an embodied agent, as described above.

An affective response system for autonomous agents creates a configurable mapping from emotions perceived in stimuli to appropriate response emotions as part of an empathetic autonomously animated system. The capacity to configure the empathetic response allows for the easy and intuitive creation of different styles of response to convey desired personality traits and tailor the emotional performance to specific use-cases. The parameters that describe the empathetic response can be modified dynamically to simulate mood swings or changes in the state of mind. The input emotions can come from any emotion classification system, including but not limited to an NLP system or facial emotional analysis system. The outputs can be used to drive verbal or nonverbal behaviours. This mapping is determined by one or more matrix operations, which may use a predefined Mapping Matrix (or a plurality thereof). In one embodiment, the Mapping Matrix comprises a matrix of weights, each vertical column corresponding to one set of per-output-emotion scalar intensities. The dimensions of the matrix are determined by the number of available input and desired output emotions.

FIG. 9 shows an affective response system. A Detection System 906 detects and/or processes input stimuli to determine Input Affect Activations. A Mapping System 907 includes a Mapping Matrix 905, which may be configured (for example, to correspond to a particular personality) using a Configuration System 908. Input Affect Activations are processed by the Mapping Matrix 905 to generate Output Affect Activations 904. Elements of the Mapping Matrix represent transformations to be applied to corresponding Input Affect Activations. Output Affect Activations 904 are then used to animate the Agent 1. Output Affect Activations 904 may contribute to Neuron Activations 910, which drive the Expression 911 of the Agent 1.

Understanding the overall effect of an emotional configuration when it is distributed throughout a codebase may be very challenging. Configurable Empathetic Response to External Emotional Stimuli solves, or at least reduces, the problem of emotion models being configured by a distributed set of constants. As the development of personality, behaviour styles, and characteristic emotional responses available to an Agent expands, extracting the configuration of the emotional model from the core codebase allows multiple versions to co-exist and enables easy swapping between versions in real-time.

The Detection System 906 may determine Input Affect Activations from any suitable source, in any suitable manner. Inputs or stimuli may originate from real-world stimuli comprising, for example, an input from one or more of a camera, electromagnetic transducer, audio transducer, keyboard or other known systems. Other stimuli include graphical user interfaces, hardware consoles, streamed data, and data from cloud computers, computer indexes, the world-wide web or a variety of sensors.

In one embodiment, the Input Affect Activations reflect Affects detected from user input from a user interacting with the virtual character, digital entity, or robot. The method may receive, from the user via a microphone of an electronic device of the user, speech input from the user. FIG. 13 shows an example of a Detection System 6 configured to determine Input Affect Activations from a user's affect, based on the speech of a User conversing with the Agent. At 502, audio of user speech, captured via the user's microphone, is received and passed to a speech-to-text system 504. The speech-to-text system 504 converts the audio to user speech in text form (506). The text 506 is compared against one or more dictionaries 508, each of which is associated with an emotional/affective dimension, which match key word/s in the text to affective categories. FIG. 10 shows an example of a Dictionary associated with positive sentiment. Where a match is found in an emotional/affective dictionary, the input Affect corresponding with that emotional/affective dimension is set to 1.0 and then passed into the emotional model, which applies the Affect Mapping Matrices to the input Affects. Alternatives to dictionary-based emotional detection include rule-based NLP methods, SVM sentiment valence detection, and machine-learning based models of emotional classification/sentiment analysis.
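
The dictionary lookup step might be sketched as follows; the affect categories and word lists are illustrative, not the contents of FIG. 10:

    AFFECT_DICTIONARIES = {
        "positive": {"great", "wonderful", "thanks"},
        "anger": {"furious", "annoyed", "unacceptable"},
    }

    def input_affect_activations(user_text: str) -> dict[str, float]:
        """Set an input Affect to 1.0 when a keyword from its dictionary matches."""
        tokens = set(user_text.lower().split())
        return {affect: (1.0 if tokens & words else 0.0)
                for affect, words in AFFECT_DICTIONARIES.items()}

    print(input_affect_activations("this is unacceptable"))  # {'positive': 0.0, 'anger': 1.0}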

Other sources of user emotional input may derive from user utterances, user voice (e.g. vocal tone), user facial expression, and user body language. The ambience of a user's environment may also provide Input Affect Activations, for example lighting conditions, or objects detected in a user's background.

In other embodiments, Input Affect Activations may be determined from content (e.g. digital content), such as video content, imagery, audio, or any other suitable content. Where content is associated with metadata available to the affective response system, content metadata may be used to determine Input Affect Activations. For example, the keywords associated with images having Alt text may be compared against a dictionary as described above, or affective qualities of the Alt text may be determined using SVM sentiment valence detection or other methods of emotional classification/sentiment analysis.

Where the Agent may be receiving input from multiple users (either from the same end-user input devices, or from independent end-user input devices), an average or interpolated Input Affect Activation may be determined from the multi-user input.

Audio sources such as music, background noises, speech, or other sounds may contribute to Input Affect Activations. Where audio comprises music, any suitable music emotion recognition method may be used to determine Input Affect Activations of the music. For example, Lu, L., Liu, D., and Zhang, H. J. (2006), "Automatic mood detection and tracking of music audio signals," IEEE Trans. Audio Speech Lang. Process. 14, 5-18, discloses a Gaussian mixture model (GMM) and Bayesian classifier to classify music emotions.

User symbolic gestures, including but not limited to deictic and iconic gestures and sign language, may be classified accordingly and influence and/or set Input Affect Activations.

In another embodiment, input stimuli may originate from the Agent's system and/or be associated with the agent. For example, where the Agent is autonomously animated with a behavioural/neurobehavioural model, parameters of the Agent's own internal state can be a source of Input Affect Activations. In agents configured with a model of attention, the Avatar's attentional state can be a source of Input Affect Activations. For agents with memories, knowledge or preferences, the memories, knowledge or preferences can influence Input Affect Activations. In conversational agents, the conversational utterances of the Agent may be processed using any suitable method to determine affective qualities for Input Affect Activations. Conversational utterances of Agents may be provided by any suitable dialogue system, such as IBM Watson or Google Dialogflow. For agents simulated in a virtual environment, aspects of the Avatar's environment can be a source of Input Affect Activations.

FIG. 11 shows an Affect Mapping Matrix according to one embodiment. Each column corresponds to the response to an input Affect; the output Affect response profile is determined by that row's weights. FIG. 10 shows an example implementation of a Mapping Matrix for personality-based emotion mapping ("Weight Matrix"), wherein the Mapping Matrix is defined using JSON. The Weight Matrix contains eight input emotions (anger, concern, disgust, fear, sad, happy, surprise, and interest). Each input emotion has a set of output emotion weights. For example, the emotional output when anger is detected in the input comprises anger, concern, fear, shame, and negative joy. The negative weight for joy is used to remove any lingering activations from previous emotions. This improves the speed, clarity, and appropriateness of responses. Every weight not explicitly set in the JSON configuration is set to 0 by default. Each row of the Mapping Matrix represents the response each input emotion will elicit. For example, an Input Affect Activation of "anger": {"anger": 0.3, "joy": -0.5} will cause a small spike in output Affect anger and a larger negative spike in output Affect joy.
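
A Python sketch mirroring this sparse, default-zero configuration style is shown below; the weights are illustrative, not the values of FIG. 10:

    # Each input emotion maps to a sparse set of output-emotion weights;
    # anything not listed defaults to 0, as in the JSON configuration.
    WEIGHT_MATRIX = {
        "anger": {"anger": 0.3, "concern": 0.6, "fear": 0.2,
                  "shame": 0.1, "joy": -0.5},  # negative joy clears lingering joy
        "happy": {"joy": 0.8, "interest": 0.3},
        # ... remaining input emotions
    }

    def respond(input_emotion: str, activation: float) -> dict[str, float]:
        """Scale the configured response row by the input activation."""
        row = WEIGHT_MATRIX.get(input_emotion, {})
        return {out: w * activation for out, w in row.items()}

    print(respond("anger", 1.0))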

FIG. 12 shows an example implementation of a Mapping Matrix for personality-based emotion intensity configuration ("Intensity Matrix"). The intensities may conveniently support scaling elements of the emotional response. This is one way to create subtle variation between, for example, a mildly happy and an extremely happy personality: the base weight matrices could be identical, with only the intensity of 'joy' changed. Whilst this invention describes a "Matrix", in some embodiments mathematically equivalent objects may replace the Affective Mapping Matrix, such as collections of vectors, systems of linear equations, or graphs.

Any suitable set of affect categories may be defined. In the example shown in FIG. 12, Input Affect Activations include the discrete affective categories of anger, concern, disgust, fear, sadness, happiness, surprise and interest. Output emotion categories are anger, concern, disgust, fear, sadness, joy, surprise, interest, shame, care and excitement. The set of Input Affect Activations may, but does not necessarily need to, correspond to the set of Output Affect Activations.

In one embodiment, Input Affect Activations (or input affects) are a matrix/vector of binary values; however, the invention is not limited in this respect. Input Affect Activations may be weighted and may comprise continuous (as opposed to binary) positive/negative values associated with each affect/emotion dimension. Input Affect Activations may represent the emotional/affective contributions for each emotion/affect from a plurality of emotional/affective dimensions or categories. For example, given the three emotional dimensions [happy, sad, angry], an input affect activation of [0, 0, 1] represents a fully "angry" input affect activation, and an input affect activation of [0, 0.5, 0.5] may represent an equally sad and angry input affect activation.

In one embodiment, Mapping Matrix values are populated from a Configuration System. In one embodiment, the Configuration System includes a "configuration file" defining a set of Mapping Matrices, each corresponding to a personality, and the Mapping Matrix values are replaced with those defined by the selected personality.

In other embodiments, the Configuration System may include functions for dynamically varying Mapping Matrix parameters according to certain hyper-parameters. Matrix variables could be trained via a machine learning method, wherein the training data are emotional inputs and responses from participants in human-to-human interactions.

Multiple matrix operations may be applied sequentially or in parallel to Input Affect Activations to generate Output Affect Activations. In one embodiment, multiple matrix operations may be applied to a plurality of Input Affect Activations from multiple sources: for example, a first matrix for user speech, and a second matrix for user facial emotions.

A plurality of matrix operations (e.g. sequential operations) may be implemented programmatically, in particular matrix multiplication and application of per-emotion scalar intensities, e.g.:

outputEmotions = weightMatrix * inputEmotions

outputEmotions[1] = outputEmotions[1] * intensities[1]

inputEmotions is of dimension [n×1], weightMatrix is of dimension [m×n], and outputEmotions is of dimension [m×1], where n = num_input_emotions and m = num_output_emotions. For example, after the personality-based emotion mapping using the Mapping Matrix for personality-based emotion mapping (shown in FIG. 11), the output from that operation is scaled by its corresponding intensity using the Mapping Matrix for personality-based emotion intensity configuration (FIG. 12). The equation comprising inputs x, weights W, and intensities b may be defined as follows to produce outputs y:

y_i = (Wx)_i * b_i
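
These two operations can be sketched with NumPy as follows, using the dimensions stated above (the specific values are illustrative):

    import numpy as np

    n, m = 8, 11                  # input and output emotion counts (as in FIGS. 11-12)
    W = np.zeros((m, n))          # weight matrix
    W[0, 0] = 0.3                 # e.g. input anger contributes to output anger
    b = np.ones(m)                # per-output-emotion intensities

    x = np.zeros(n)
    x[0] = 1.0                    # a fully "angry" input activation
    y = (W @ x) * b               # y_i = (W x)_i * b_i
    print(y)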

In FIG. 12, the intensities vector lists the available output emotions and their corresponding intensities. The output Affects are produced by scaling the product of all previous matrix operations by their corresponding intensity. This can be used to enhance the characteristic emotional responses. The example weight matrix and intensities vector values in FIG. 8 are configured to convey a general-use, balanced, appropriate, empathetic response, configured to fill the social role of a polite acquaintance, perhaps in a customer service type role. Each input emotion is acknowledged and modulated to show that some processing has been done prior to responding. For example, the response to 'anger' is mostly concern and joy negation, with some additional anger, fear, and a tiny amount of shame. In this use case, it would not be socially acceptable to get angry back at a user if they were getting angry. Some other options could be to have predominantly fear and sadness, for more of a timid personality, or happiness and disgust for a more sassy and rude personality, or something more subtle. Depending on the situation and goal of the interaction, different response compositions will be appropriate.

Configuring the matrix and intensities to convey personality traits is reasonably simple, as it can be done by changing Mapping Matrix values. The equation used to calculate output Affects need not be linear as in the example above; it could be any linear or non-linear equation applying the Mapping Matrices to the input Affects.

Mapping matrices configured for various personalities/temperaments can be interpolated. For example, given Matrix A, representing a first personality profile (e.g. shy), and Matrix B, representing a second personality (e.g. ambitious), an interpolation between Matrix A and Matrix B can be given by the matrix operation:

αA + (1 − α)B

Where α is the interpolation parameter. In general, M matrices could be interpolated between with M − 1 interpolation parameters (the M weights summing to 1), or alternatively by statistical methods. Matrices can be temporarily interpolated to simulate mood swings or changes to empathetic behaviour due to other physical or physiological stimuli. Alternatively, a user-friendly method of precise personality configuration can be provided by allowing users to finely interpolate between any personality profiles, as determined by a personality matrix.
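
A convex blend of M matrices (of which the two-matrix case above is an instance) might be sketched as follows; the personality matrices here are hypothetical placeholders:

    import numpy as np

    def blend_personalities(matrices, weights):
        """Blend M mapping matrices with M interpolation weights summing to 1."""
        weights = np.asarray(weights, dtype=float)
        assert np.isclose(weights.sum(), 1.0), "interpolation weights must sum to 1"
        return sum(w * M for w, M in zip(weights, matrices))

    shy = np.eye(3) * 0.2        # hypothetical personality matrices
    ambitious = np.eye(3) * 0.9
    print(blend_personalities([shy, ambitious], [0.25, 0.75]))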

Matrix interpolation can be done through linear blending on the matrices directly, or through a logarithmic transform of the matrices (i.e. interpolating in Lie subgroups). Alternatively, the matrices can be projected onto a latent space using techniques such as Principal Component Analysis, Linear Discriminant Analysis, or an Autoencoder, where interpolation can be performed on the latent variables.

An identity matrix could be used to create direct mimicry. This can be used to create a simple, primary/emotional empathy system where the output Affects are identical to the input Affects. The identity matrix could be interpolated with another matrix to provide degrees of mimicry.

The Mapping Matrix defines the empathetic response to external stimuli, and therefore variations of the Mapping Matrix can be created to mimic different personalities and temperaments. In addition, these mapping matrices can be created as a function of output emotion or mood, resulting in a dynamic feedback system that simulates homeostatic regulation. This homeostatic regulation can be implemented by concatenating the output emotion vector from the previous evaluation step onto the current input emotion vector, thereby expanding the dimensionality of the Mapping Matrix.

The parameters of the Mapping Matrix may be changed in real time based on the internal state of the agent or certain external stimuli. On top of linear multiplication of the input emotion value by the Mapping Matrix, nonlinearity can be introduced by cascading multiple Mapping Matrices, where each weight Matrix defines the response to one external stimulus (such as the user's speech, the user's facial expressions, etc.). In addition, a leaky integrator inspired by a biological neuron model can be used to manipulate the Mapping Matrix, where the Elements of the Mapping Matrix update due to other external stimuli such as ambient sound (e.g. loud noise). In this implementation, the external stimuli are introduced as input voltages, whereas the output voltage corresponds to the Element values of the Mapping Matrix. When the stimuli go away, the Elements of the Mapping Matrix return to their resting state. Neural networks or statistical regressions could also be used here, where the Elements of the Mapping Matrices are trained on data collected from human social interactions.
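
The leaky-integrator behaviour might be sketched as below; the time constant, step size, and stimulus encoding are assumptions made for illustration only:

    import numpy as np

    def leaky_update(M, M_rest, stimulus, tau=2.0, dt=0.1):
        """One Euler step of dM/dt = (-(M - M_rest) + stimulus) / tau."""
        return M + dt * (-(M - M_rest) + stimulus) / tau

    M_rest = np.eye(2)                                 # resting Mapping Matrix
    M = M_rest.copy()
    loud_noise = np.array([[0.0, 0.5], [0.0, 0.0]])    # hypothetical stimulus drive
    for _ in range(50):
        M = leaky_update(M, M_rest, loud_noise)        # stimulus present: elements shift
    for _ in range(200):
        M = leaky_update(M, M_rest, np.zeros((2, 2)))  # stimulus removed: decay to rest
    print(np.round(M, 3))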

A user interface may be provided, enabling a user to adjust values of the agent. The user interface may allow the user to adjust parameters in real-time and see the behaviour reflected in the agent in real-time. An audio-visual user interface may be provided for customizing the agent based on a spoken conversation between a user and the agent. Accordingly, this aspect of the invention departs from the known approach of providing, and possibly overwhelming, the user with several user interface control elements in the form of buttons, sliders and the like to customize an agent. Instead, the described aspect of the invention provides an audio-visual interface which allows the user and the agent to conduct a spoken conversation during the customization process. This way, the user can be guided through the customization process by way of the agent conversing with the user. This enables the user to create the desired affective/emotional/behavioural profile of the agent in a faster and more intuitive manner, allowing the user to focus on creativity without being overwhelmed by a complex graphical user interface. In other words, this aspect of the invention assists the user in performing the technical task of generating a realistic, autonomously animated agent by means of a continued and guided human-machine interaction process.

Certain embodiments use natural language processing (NLP) techniques to understand the intent of the user and to drive matrix configurations through these intents. A combination of NLP and/or regular expression matching may be used to extract the user's matrix modification intent. The method may also display a selection of possible modifications to drive the discussion. The method advises the user when a requested feature modification is outside a defined range. The agent's questions and responses to the user may be generated using NLP or other similar techniques.

When the user makes a customization request, the NLP model identifies emotions and the corresponding personality modifications and triggers a change in at least one of the Affect Mapping Matrices whereby at least one of the output Affects is changed. For example, if the user says "Be less aggressive", the emotional dimension associated with "anger" may be identified, and the linear multiplier of the output Affect anger from one or more of the Affect Mapping Matrices could be decreased. The script then executes those orders and generates an agent with the corresponding Matrix values adjusted to reflect the user's request.

Certain embodiments have been described which provide an intuitive way to customize a digital agent's behaviour by letting the user describe the features of the agent and the desired customization options. Certain embodiments provide a framework which guides the creative process in an interactive manner, making agent creation accessible to non-professional communities.

In another aspect of the invention, the method comprises the step of determining whether the customization request meets one or more customization constraints, and the step of customizing the agent in accordance with the customization request if, and preferably only if, the one or more customization constraints are met. Accordingly, this aspect ensures Mapping Matrix values can be customized only within certain predefined reasonable boundaries, which reduces the likelihood of creating inadequate or implausible animation/behaviour.

In one embodiment, the invention provides a computer-implemented method for automatically generating Affective expression of a virtual character, digital entity, or robot, comprising: receiving input Affect activations from at least one source, for a plurality of Affect categories; using input Affect activations and at least one Mapping Matrix in at least one matrix operation to generate output Affect activations, wherein elements of the Mapping Matrix represent transformations to be applied to corresponding input Affect activations; and generating at least one Affective expression for the virtual character, digital entity, or robot, wherein the Affective expression is influenced by and/or determined by the output Affect activations. The non-zero elements of at least one Affect Mapping Matrix may represent linear multipliers on input Affect activations. Optionally, output Affect activations activate neurons of a neurobehavioural model of the virtual character, digital entity, or robot to drive at least one Affective expression.

There may be provided at least two mapping matrices, wherein in a first matrix operation, a first mapping matrix is an affect mapping matrix representing linear multipliers on input Affect activations, and in a second mapping operation, a second mapping matrix is an intensity mapping matrix scaling the result of the first matrix operation. Mapping Matrix elements may be variables, which may be set by a user. Variables may be set by a user conversing in natural language with the virtual character, digital entity, or robot via an audio-visual interface. Variables may be trained via a machine learning method. Variables may be adjusted during live operation of the virtual character, digital entity, or robot.

In one embodiment, the Mapping Matrix is selected from a plurality of predefined Affect Mapping Matrices, wherein each predefined Affect Mapping Matrix is configured for a personality or temperament of the virtual character, digital entity, or robot.

The Mapping Matrix may be an interpolation between a plurality of predefined Affect Mapping Matrices, wherein each predefined Affect Mapping Matrix is configured for a personality or temperament of the virtual character, digital entity, or robot. The source of input Affect activations may be at least one user interacting with the virtual character, digital entity, or robot, or the virtual character, digital entity, or robot itself. Input Affect activations may reflect Affects detected from one or more of: user utterances; user voice; user facial expression; user body language; and user environment. Affect activations may reflect Affects detected from one or more of: behavioural state parameters; attentional parameters; conversation; and environment of the virtual character, digital entity, or robot.

The methods and systems described may be utilised on any suitable electronic computing system. According to the embodiments described below, an electronic computing system utilises the methodology of the invention using various modules and engines. The electronic computing system may include at least one processor, one or more memory devices or an interface for connection to one or more memory devices, input and output interfaces for connection to external devices in order to enable the system to receive and operate upon instructions from one or more users or external systems, a data bus for internal and external communications between the various components, and a suitable power supply. Further, the electronic computing system may include one or more communication devices (wired or wireless) for communicating with external and internal devices, and one or more input/output devices, such as a display, pointing device, keyboard or printing device.

The processor is arranged to perform the steps of a program stored as program instructions within the memory device. The program instructions enable the various methods of performing the invention as described herein to be performed. The program instructions may be developed or implemented using any suitable software programming language and toolkit, such as, for example, a C-based language and compiler. Further, the program instructions may be stored in any suitable manner such that they can be transferred to the memory device or read by the processor, such as, for example, being stored on a computer readable medium. The computer readable medium may be any suitable medium for tangibly storing the program instructions, such as, for example, solid state memory, magnetic tape, a compact disc (CD-ROM or CD-R/W), memory card, flash memory, optical disc, magnetic disc or any other suitable computer readable medium.

The electronic computing system is arranged to be in communication with data storage systems or devices (for example, external data storage systems or devices) in order to retrieve the relevant data. It will be understood that the system herein described includes one or more elements that are arranged to perform the various functions and methods as described herein. The embodiments herein described are aimed at providing the reader with examples of how various modules and/or engines that make up the elements of the system may be interconnected to enable the functions to be implemented. Further, the embodiments of the description explain, in system related detail, how the steps of the herein described method may be performed. The conceptual diagrams are provided to indicate to the reader how the various data elements are processed at different stages by the various different modules and/or engines. It will be understood that the arrangement and construction of the modules or engines may be adapted accordingly depending on system and user requirements so that various functions may be performed by different modules or engines to those described herein, and that certain modules or engines may be combined into single modules or engines.

It will be understood that the modules and/or engines described may be implemented and provided with instructions using any suitable form of technology. For example, the modules or engines may be implemented or created using any suitable software code written in any suitable language, where the code is then compiled to produce an executable program that may be run on any suitable computing system. Alternatively, or in conjunction with the executable program, the modules or engines may be implemented using any suitable mixture of hardware, firmware and software. For example, portions of the modules may be implemented using an application specific integrated circuit (ASIC), a system-on-a-chip (SoC), field programmable gate arrays (FPGA) or any other suitable adaptable or programmable processing device.

The methods described herein may be implemented using a general-purpose computing system specifically programmed to perform the described steps. Alternatively, the methods described herein may be implemented using a specific electronic computer system such as a data sorting and visualisation computer, a database query computer, a graphical analysis computer, a data analysis computer, a manufacturing data analysis computer, a business intelligence computer, an artificial intelligence computer system etc., where the computer has been specifically adapted to perform the described steps on specific data captured from an environment associated with a particular field.

1-36. (canceled)
 37. A method of animating a virtual character or digital entity, including the steps of: receiving Input Text specifying words to be spoken by the virtual character or digital entity; for one or more body parts of the virtual character or digital entity, determining a Pose to be applied to Input Text; determining an Action of the body parts to be applied to Input Text; and generating at least one motion of the virtual character or digital entity representing the Action being applied from the Pose.
 38. The method of claim 37, wherein the pose is an arm pose and the method of determining a pose includes the step of determining a horizontal distance between arms.
 39. The method of claim 37, wherein the pose is an arm pose and the method of determining a pose includes the step of determining a vertical height of one or more arms.
 40. The method of claim 37, wherein the at least one motion is a beat gesture.
 41. The method of claim 37, wherein the step of determining a pose includes the steps of: determining an Input Pose of one or more body parts to be animated; determining a Variation Pose of the one or more body parts, the Variation Pose configured to blend with the Input Pose; and determining a Blended Pose comprising a weighted interpolation between the Input Pose and the Variation Pose.
 42. A system for animating a virtual character or digital entity, the system including: an input module, the input module receiving Input Text; a determination module, the determination module determining: a Pose for one or more body parts of the virtual character or digital entity to be applied to Input Text; and an Action of the body parts to be applied from the Pose; and an output module, the output module generating at least one motion of the virtual character or digital entity based on the Pose and the Action.
 43. A method of animating a virtual character or digital entity, including the steps of: determining an Input Pose of one or more body parts to be animated; determining a Variation Pose of the one or more body parts, the Variation Pose configured to blend with the Input Pose; determining a Blended Pose comprising a weighted interpolation between the Input Pose and the Variation Pose; and animating the virtual character or digital entity using the Blended Pose.
 44. A system for animating a virtual character or digital entity, the system including: a determination module, the determination module determining: an Input Pose to be animated; a Variation Pose representative of a Gesture, the Variation Pose configured to blend with the Input Pose; and a Blended Pose comprising a weighted interpolation between the Input Pose and the Variation Pose; and an animating module animating the virtual character or digital entity using the Blended Pose.
 45. A method of animating a virtual character or digital entity, including the steps of: receiving Input Text specifying words to be spoken by the virtual character or digital entity; determining an emphasis score of each word in the Input Text; determining a set of words with relatively higher emphasis scores compared to the remaining words in the Input Text; and animating a virtual character or digital entity to speak the Input Text and applying a gesture to each word from the set of words with relatively higher emphasis scores.
 46. The method of claim 45, wherein the gesture is applied to a stressed syllable of each word from the set of words with relatively higher emphasis scores.
 47. The method of claim 45, wherein the gesture is a beat gesture.
 48. The method of claim 45, wherein the emphasis score is based on word rarity, wherein words with higher rarity have a higher emphasis score.
 49. The method of claim 45, wherein the set of words with relatively higher emphasis scores comprises words from the Input Text with an emphasis score within a predefined top percentile of all words in the Input Text.
 50. The method of claim 45, wherein the gesture applied to each word from the set of words has a gesture amplitude proportional or substantially proportional to the emphasis score of the word.
 51. The method of claim 45, wherein the emphasis score is calculated by applying a set of criteria to determine the emphasis score of each word, wherein a contribution of each criterion to the emphasis score is weighted using a weighting.
 52. The method of claim 51, wherein the set of criteria are selected from the group consisting of: word sentiment, part of speech, capitalization, negation and rarity.
 53. A system for animating a virtual character or digital entity, the system including: input-receiving means for receiving Input Text; a plurality of gestures, wherein each gesture is associated with: an animation; at least one configurable parameter for varying the animation; and a configuration range of the configurable parameter; and an animation generator, wherein the animation generator is configured to: analyse Input Text to determine at least one gesture; determine a configuration of the configurable parameter from the configuration range; and animate the virtual character or digital entity with the animation as varied by the configurable parameter, wherein each gesture is associated with at least one modulatory variable for modulating the configuration range of the configurable parameter, and wherein the animation generator is configured to determine a configuration of the configurable parameter as modified by the modulatory variable.
 54. The system of claim 53, wherein the configuration of the configurable parameter from the configuration range is randomly determined.
 55. The system of claim 53, wherein configurable parameters are selected from the group consisting of: gesture speed, gesture amplitude and gesture pose.
 56. A method of animating a virtual character or digital entity, including the steps of: configuring one or more parameters of a set of gestures, the parameters being configured such that the gestures reflect traits of the virtual character or digital entity; and animating the virtual character or digital entity with the gestures as configured by the one or more parameters.